All the material presented here, to the extent it is original, is available under CC-BY-SA.
09:00-12:00 Spatial autocorrelation and machine learning; Point pattern analysis; Transport, networks, transmission
13:00-15:45 Project consultations
Depending on the number of participants wishing to present project outlines (about 20 minutes, 10-ish minutes presentation, 10-ish minutes discussion, the number of slots may vary.
09:00-12:00 Presentation of about seven project outlines, with 20 min. break.
12:30-15:30 Presentation of about seven project outlines, with 20 min. break.
15:30-16:00 Round-up and feedback
https://spatialsample.tidymodels.org/index.html https://mlr3spatial.mlr-org.com/ https://mlr3spatiotempcv.mlr-org.com/ https://cran.r-project.org/package=blockCV
What we see on a map is a pattern, or perhaps some patterns mixed together.
It is not easy to work back from map pattern to the process or processes that generated it/them.
Using a variety of approaches, we can explore and analyse point patterns, also reviewing an important chapter in the development of quantitative geography.
Practically, we will also see how we can try out different approaches, and how their assumptions affect our conclusions.
David O’Sullivan and David Unwin (2003) , Wiley, chapter 4, plus chapter 3 for the curious (or 2010, ch. 5 plus ch. 4);
Ian Smalley and David Unwin (1968) The formation and shape of drumlins and their distribution and orientation in drumlin fields, , 7, pp. 377–390; Alan R. Hill (1973) The distribution of drumlins in County Down, Ireland, , 63 (2). pp. 226–240.
Others may also like Trevor Bailey and Anthony Gatrell (1995) , Longman, chapter 3.
library(sf)
## Linking to GEOS 3.11.0, GDAL 3.6.0, PROJ 9.1.0; sf_use_s2() is TRUE
drumlins <- st_geometry(st_read("drumlins.gpkg"))
## Reading layer `drumlins' from data source
## `/home/rsb/und/ecs530/ECS530_h22/drumlins.gpkg' using driver `GPKG'
## Simple feature collection with 232 features and 0 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 0.2804878 ymin: 5.567073 xmax: 8.231707 ymax: 13.4878
## Projected CRS: Undefined Cartesian SRS
A data set similar to the one refered to by O’Sullivan and Unwin on p. 100-101 is available in spatial in R (associated with Venables and Ripley (2002) Modern Applied Statistics with S) — it is the one used by Upton and Fingleton, coded by Ripley. We have here copied the points to a shapefile.
Hill, 1973
Upton & Fingleton, 1985
library(spatstat)
## Loading required package: spatstat.data
## Loading required package: spatstat.geom
## spatstat.geom 3.0-3
## Loading required package: spatstat.random
## spatstat.random 3.0-1
## Loading required package: spatstat.explore
## Loading required package: nlme
## spatstat.explore 3.0-5
## Loading required package: spatstat.model
## Loading required package: rpart
## spatstat.model 3.0-2
## Loading required package: spatstat.linnet
## spatstat.linnet 3.0-2
##
## spatstat 3.0-2
## For an introduction to spatstat, type 'beginner'
(drumlins_ppp <- as.ppp(drumlins))
## Planar point pattern: 232 points
## window: rectangle = [0.280488, 8.231707] x [5.567073, 13.487805] units
Although spatstat and the sp classes have developed independently, they have a good deal in common, and point patterns, images and polygon windows can be exchanged
Point pattern objects need bounding windows to show where the population of data points were collected. The default window is the bounding box of the points, but others are available.
bb <- boundingbox(drumlins_ppp)
ch <- convexhull.xy(drumlins_ppp)
rr <- ripras(drumlins_ppp)
drumlins_rr <- ppp(drumlins_ppp$x, drumlins_ppp$y, window=rr)
plot(drumlins_ppp)
plot(bb, add=TRUE, border="darkgreen", lwd=2, lty=1)
plot(ch, add=TRUE, border="darkred", lwd=2, lty=3)
plot(rr, add=TRUE, border="orange", lwd=2, lty=2)
One legacy approach to point patterns, avoiding the drudge of measuring inter-point distances, has been to divide the study area into quadrats, and count the numbers of points falling into each quadrat. This can take the form of a 2D histogram, or be displayed as an image plot.
qc <- quadratcount(drumlins_ppp)
plot(drumlins, cex=0.8)
t3 <- cbind(expand.grid(x=attr(qc, "xbreaks")[1:5] + diff(attr(qc, "xbreaks"))[1]/2, y=rev(attr(qc, "ybreaks")[1:5] + diff(attr(qc, "ybreaks"))[1]/2)), qc=c(t(qc)))
text(t3[,1], t3[,2], t3[,3], cex=1.2, font=2, col="darkred")
abline(h=attr(qc, "ybreaks"))
abline(v=attr(qc, "xbreaks"))
Chi-squared tests for Complete Spatial Randomness using quadrat counts may seem attractive, but suffer from the same problems as do histogram bins:
quadrat.test(drumlins_ppp)
##
## Chi-squared test of CSR using quadrat counts
##
## data: drumlins_ppp
## X2 = 37.828, df = 24, p-value = 0.07222
## alternative hypothesis: two.sided
##
## Quadrats: 5 by 5 grid of tiles
Just adding one more row and column of quadrats, or switching windows, changes our conclusion:
quadrat.test(drumlins_ppp, nx=6)
##
## Chi-squared test of CSR using quadrat counts
##
## data: drumlins_ppp
## X2 = 41.414, df = 35, p-value = 0.4221
## alternative hypothesis: two.sided
##
## Quadrats: 6 by 6 grid of tiles
quadrat.test(drumlins_rr)
## Warning: Some expected counts are small; chi^2 approximation may be inaccurate
##
## Chi-squared test of CSR using quadrat counts
##
## data: drumlins_rr
## X2 = 34.389, df = 24, p-value = 0.156
## alternative hypothesis: two.sided
##
## Quadrats: 25 tiles (irregular windows)
Density plots use a 2D kernel, in spatstat a Gaussian kernel, to create smoothed histograms avoiding the problems of quadrat counts. The key argument to pass to the density method for point patterm objects is sigma=, which determines the bandwidth of the kernel. Since we can coerce the image objects output by the method to an sp class, we use this to cumulate density values for different values of sigma.
crds <- crds <- st_coordinates(st_sample(st_as_sfc(rr), size=10000, type="regular"))
crds <- list(x=crds[,1], y=crds[,2])
library(raster)
## Loading required package: sp
##
## Attaching package: 'raster'
## The following object is masked from 'package:nlme':
##
## getData
k02 <- as(density(drumlins_rr, sigma=0.2, xy=crds), "RasterLayer")
k04 <- as(density(drumlins_rr, sigma=0.4, xy=crds), "RasterLayer")
k06 <- as(density(drumlins_rr, sigma=0.6, xy=crds), "RasterLayer")
k08 <- as(density(drumlins_rr, sigma=0.8, xy=crds), "RasterLayer")
rB <- brick(k02, k04, k06, k08)
library(stars)
## Loading required package: abind
rB_st <- st_as_stars(rB)
library(tmap)
st_crs(rB_st) <- 32662
st_crs(drumlins) <- 32662
## Warning: st_crs<- : replacing crs does not reproject data; use st_transform for
## that
tm_shape(rB_st) + tm_raster(title="Density") + tm_layout(panel.labels=c("0.2", "0.4", "0.6", "0.8")) + tm_shape(drumlins) + tm_symbols(size=0.25, shape=4)
Narrower bandwidths yield more extreme values, broader bandwidths narrow the interquartile range. From this table, we can see how the change in the bandwidth is affecting the relative differences in our view of the local estimates of intensity.
summary(rB)
## layer.1 layer.2 layer.3 layer.4
## Min. 1.111588e-06 0.05761833 0.3330945 0.6394703
## 1st Qu. 1.686969e+00 2.57274436 2.8860319 3.0973568
## Median 3.623539e+00 3.76488457 3.8278332 3.8357639
## 3rd Qu. 5.361415e+00 4.85315654 4.5950619 4.4103786
## Max. 1.378788e+01 7.59639475 5.7175463 5.3068697
## NA's 7.160000e+02 716.00000000 716.0000000 716.0000000
boxplot(rB)
We can find and plot nearest neighbour distances, finding them with nndist — plotting the empirical cumulative distribution function of the nearest neighbour distances is interesting:
nns <- nndist(drumlins_rr)
summary(nns)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1446 0.2700 0.3203 0.3254 0.3685 1.0073
plot(ecdf(nns))
The \(\hat{G}\) measure turns out to be just the ECDF of the nearest neighbour distances, plotted by default with the expected CSR line; Gest returns binned values for a range of distance bins best chosen by the function:
plot(ecdf(nns), xlim=c(0, 0.5))
plot(Gest(drumlins_ppp), add=TRUE, lwd=3)
If we generate many simulated CSR point patterns for the current window, we can use the envelope method to explore whether the observed \(\hat{G}\) measures lie in relation to the simulated ones:
n <- drumlins_rr$n
set.seed(121122)
ex <- expression(runifpoint(n, win=rr))
res <- envelope(drumlins_rr, Gest, nsim=99, simulate=ex,
verbose=FALSE, savefuns=TRUE)
plot(res, xlim=c(0,0.7))
for(i in 2:100) lines(attr(res, "simfuns")[[1]], attr(res, "simfuns")[[i]], col="grey")
plot(res, add=TRUE, lwd=3, xlim=c(0,0.7))
We can also compute the nearest neighbour based Clark/Evans R statistic :
clarkevans(drumlins_ppp)
## naive Donnelly cdf
## 1.248991 1.214479 1.229503
clarkevans(drumlins_rr, correction="none")
## [1] 1.238292
clarkevans(drumlins_rr, correction="guard", clipregion=erosion.owin(rr, r=1))
## [1] 1.194691
which seem to indicate that the observed and CSR expected distances are similar, but perhaps more evenly spaced than clustered.
From what we have seen, it appears the the drumlin summit points are more regularly than randomly distributed. If we think, however, the absence of short nearest neighbour distance may mean that they “push” each other apart (in fact this is about points not being a good way of representing ellipses) — so we can try to simulate from a Simple Sequential Inhibition (SSI) process with a 180m inhibition radius instead of CSR:
ex <- expression(rSSI(0.18, n, win=rr))
set.seed(121122)
res <- envelope(drumlins_rr, Gest, nsim=99, simulate=ex,
verbose=FALSE, savefuns=TRUE)
null <- capture.output(plot(res, xlim=c(0,0.7)))
for(i in 2:100) lines(attr(res, "simfuns")[[1]], attr(res, "simfuns")[[i]], col="grey")
null <- capture.output(plot(res, add=TRUE, lwd=3, xlim=c(0,0.7)))
As we know, G-hat uses nearest neighbour distances to express summary features of a point pattern. The K-hat function uses point intensities in rings spreading out from the points, and so uses more of the data to examine what is driving the process (reported here as L-hat):
ex <- expression(runifpoint(n, win=rr))
set.seed(121122)
res <- envelope(drumlins_rr, Kest, nsim=99, simulate=ex,
verbose=FALSE, savefuns=TRUE)
r <- res$r
Lhat <- function(k, r) { (sqrt(k/pi)) - r }
plot(r, Lhat(res$obs, r), type="n", ylab="L(r)", main="CSR simulation", ylim=c(-0.17, 0.1))
for(i in 2:100) lines(r, Lhat(attr(res, "simfuns")[[i]], r), col="grey")
lines(r, Lhat(res$obs, r), lwd=2, col="brown")
lines(r, Lhat(res$lo, r), lwd=2, col="black", lty=2)
lines(r, Lhat(res$hi, r), lwd=2, col="black", lty=2)
From what we already know, drumlins represented as points appear to inhibit each other under a distance of about 200m, so running the \(\hat{K}\) measure with an SSI process should show more of what is going on:
ex <- expression(rSSI(0.18, n, win=rr))
set.seed(121122)
res <- envelope(drumlins_rr, Kest, nsim=99, simulate=ex,
verbose=FALSE, savefuns=TRUE)
r <- res$r
Lhat <- function(k, r) { (sqrt(k/pi)) - r }
plot(r, Lhat(res$obs, r), type="n", ylab="L(r)", main="SSI simulation", ylim=c(-0.17, 0.1))
for(i in 2:100) lines(r, Lhat(attr(res, "simfuns")[[i]], r), col="grey")
lines(r, Lhat(res$obs, r), lwd=2, col="brown")
lines(r, Lhat(res$lo, r), lwd=2, col="black", lty=2)
lines(r, Lhat(res$hi, r), lwd=2, col="black", lty=2)
Another possibility is that the CSR hypothesis is at error on assuming that the process is homogeneous — we may also test against an inhomogeneous process using the Kinhom function. If its lambda argument is omitted, it does leave-one-out kernel smoothing to find \(\lambda_i\) by omitting the \(i\)-th point:
ex <- expression(runifpoint(n, win=rr))
set.seed(121122)
res <- envelope(drumlins_rr, Kinhom, nsim=99, simulate=ex,
verbose=FALSE, savefuns=TRUE)
r <- res$r
Lhat <- function(k, r) { (sqrt(k/pi)) - r }
plot(r, Lhat(res$obs, r), type="n", ylab="L(r)", main="CSR simulation", ylim=c(-0.17, 0.1))
for(i in 2:100) lines(r, Lhat(attr(res, "simfuns")[[i]], r), col="grey")
lines(r, Lhat(res$obs, r), lwd=2, col="brown")
lines(r, Lhat(res$lo, r), lwd=2, col="black", lty=2)
lines(r, Lhat(res$hi, r), lwd=2, col="black", lty=2)
Finally, we round off with an inhomogeneous SSI process:
ex <- expression(rSSI(0.18, n, win=rr))
set.seed(121122)
res <- envelope(drumlins_rr, Kinhom, nsim=99, simulate=ex,
verbose=FALSE, savefuns=TRUE)
r <- res$r
Lhat <- function(k, r) { (sqrt(k/pi)) - r }
plot(r, Lhat(res$obs, r), type="n", ylab="L(r)", main="SSI simulation", ylim=c(-0.17, 0.1))
for(i in 2:100) lines(r, Lhat(attr(res, "simfuns")[[i]], r), col="grey")
lines(r, Lhat(res$obs, r), lwd=2, col="brown")
lines(r, Lhat(res$lo, r), lwd=2, col="black", lty=2)
lines(r, Lhat(res$hi, r), lwd=2, col="black", lty=2)
Comparing the SSI with the CSR results indicates that we can not only reject the CSR process as being that driving drumlin point locations, but we have good grounds to suppose that a spatial inhibition process is operating
It is also very possible that the process is inhomogeneous, that is that an omitted heterogeneity in the surface is influencing the point pattern
The minimum drumlin footprint is such that drumlins cannot be closer to each other than a certain minimum distance
At short range, the estimated L-hat values are lower than the lower envelope bounds, but only less than 0.4 distance units — this corresponds to spatial inhibition
As the L-hat simulation values for the SSI process indicate, drumlins are not well represented by points, because they have area, even volume, and rarely overlap’’
It is frequently the case that point patterns are not measured on homogeneous surfaces
One way to tackle this is, as in these examples from Zhang et al. (2009), to sample from the population at risk to form a control group
We are then interested in examining the spatial distribution of cases compared to the spatial distribution of controls
The cases are observations of schistosomiasis in Guichi, China, and the controls are taken from the polulation at risk
First we’ll read in the points, enclosing polygon, and river banks:
points <- st_read("cc.gpkg", layer="points")
## Reading layer `points' from data source `/home/rsb/und/ecs530/ECS530_h22/cc.gpkg' using driver `GPKG'
## Simple feature collection with 166 features and 1 field
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 0.1299685 ymin: 0.09089491 xmax: 0.9548543 ymax: 0.8590922
## Projected CRS: Undefined Cartesian SRS
rivers <- st_geometry(st_read("cc.gpkg", layer="rivers"))
## Reading layer `rivers' from data source `/home/rsb/und/ecs530/ECS530_h22/cc.gpkg' using driver `GPKG'
## Simple feature collection with 5 features and 0 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: -0.1455332 ymin: 0.3004564 xmax: 0.9188697 ymax: 1.181636
## Projected CRS: Undefined Cartesian SRS
poly <- st_geometry(st_read("cc.gpkg", layer="poly"))
## Reading layer `poly' from data source `/home/rsb/und/ecs530/ECS530_h22/cc.gpkg' using driver `GPKG'
## Simple feature collection with 1 feature and 0 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: 0 ymin: 0 xmax: 1 ymax: 0.8758322
## Projected CRS: Undefined Cartesian SRS
plot(poly)
plot(rivers, add=TRUE)
plot(st_geometry(points), pch=3:4, add=TRUE)
We coerce the sp objects to their spatstat representations, the points with marks:
points$mark <- factor(points$mark)
points.ppp <- as.ppp(points)
points.ppp$window <- as.owin(poly)
summary(points.ppp)
## Marked planar point pattern: 166 points
## Average intensity 317.9072 points per square unit
##
## Coordinates are given to 8 decimal places
##
## Multitype:
## frequency proportion intensity
## case 83 0.5 158.9536
## control 83 0.5 158.9536
##
## Window: polygonal boundary
## single connected closed polygon with 328 vertices
## enclosing rectangle: [0, 1] x [0, 0.8758322] units
## Window area = 0.522165 square units
## Fraction of frame area: 0.596
plot(split(points.ppp))
We make a mask for the kernel density calculations, and show the overall density:
XY <- st_coordinates(st_sample(poly, size=10000, type="regular"))
XY <- list(x=XY[,1], y=XY[,2])
Z <- density(points.ppp, xy=XY)
plot(Z)
The split density shows how the point patterns differ:
Z <- density(split(points.ppp), xy=XY)
plot(Z)
We can also display the case density as a proportion of the overall density:
pCases <- with(Z, eval.im(case/(case + control)))
plot(pCases)
plot(points.ppp, add=TRUE)